Introduction

This analysis runs a simple procedure that uses ANOVA to identify variables that differ significantly across sample groups and then plots a heatmap of these variables.

 


1 Description

1.1 Project

Comparison between cell lines from 9 different cancer tissues (NCI-60); GSE5949

1.2 PubMed

Reinhold WC, Reimers MA, Lorenzi P, Ho J et al. Multifactorial regulation of E-cadherin expression: an integrative study. Mol Cancer Ther 2010 Jan;9(1):1-16. PMID: 20053763.

1.3 Experimental design

Comparison between cell lines from 9 different cancer tissue-of-origin types (Breast, Central Nervous System, Colon, Leukemia, Melanoma, Non-Small Cell Lung, Ovarian, Prostate, Renal) from the NCI-60 panel

1.4 Analysis

Cluster genes co-expressed across 9 tissues/organs

2 Summary statistics

The input data matrix has

  • 9 sample groups
  • 60 total samples
  • 3105 total variables
    • No variables contain missing values.

Table 1. Distribution of the per-variable mean, standard deviation, and range across all variables.

Statistic  Min.       1st Qu.    Median     Mean       3rd Qu.    Max.
Mean       2.2745000  4.3560000  5.6256667  5.6818881  6.8808333  12.162667
SD         0.5000402  0.5625195  0.6536183  0.7684433  0.8506456  3.108248
Range      2.0100000  2.7500000  3.2900000  3.6183349  4.2000000  8.840000
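
A table of this shape can be produced roughly as follows. This is a minimal sketch, not the template's actual code: the matrix x is simulated stand-in data, and it assumes that "Range" means the maximum minus the minimum of each variable.

```r
set.seed(1)
# Hypothetical stand-in for the input matrix: variables in rows, samples in columns
x <- matrix(rnorm(20 * 6, mean = 6), nrow = 20,
            dimnames = list(paste0("var", 1:20), paste0("s", 1:6)))

# Per-variable statistics (Range = max - min is an assumption)
stat <- cbind(
  Mean  = rowMeans(x),
  SD    = apply(x, 1, sd),
  Range = apply(x, 1, function(v) diff(range(v)))
)

# Six-number summary of each statistic, one row per statistic
tbl <- t(apply(stat, 2, summary))
print(tbl)
```

Each row of tbl then summarizes how one statistic is distributed over all variables.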

3 Variable selection

3.1 Run ANOVA

Run a one-way ANOVA on each variable to identify those that differ significantly across the sample groups.

Figure 1. Distribution of ANOVA p values. Number of variables with p values within each 0.01 interval.
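
The per-variable test and the binning behind Figure 1 can be sketched as below. The data, the group labels, and the object names are made up for illustration; the template's own implementation may differ.

```r
set.seed(1)
# Hypothetical data: 50 variables (rows) x 12 samples (columns), 3 groups of 4
grp <- factor(rep(c("A", "B", "C"), each = 4))
x <- matrix(rnorm(50 * 12), nrow = 50)
x[1:5, grp == "C"] <- x[1:5, grp == "C"] + 10  # make the first 5 variables differ

# One-way ANOVA per variable; keep the p value of the group term
anova_p <- apply(x, 1, function(v) anova(lm(v ~ grp))[["Pr(>F)"]][1])

# Count the variables whose p value falls in each 0.01-wide interval
bins <- hist(anova_p, breaks = seq(0, 1, by = 0.01), plot = FALSE)
head(bins$counts)
```

With real data, a strong excess of counts in the first interval suggests that many variables differ among groups.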

3.2 Select variables

Significant variables were selected using the following criteria:

  • Select variables with ANOVA p values less than 10^{-5}
  • Stop if the number of remaining variables is between 100 and 2000; otherwise
    • if fewer than 100 variables remain, select the 100 variables with the smallest p values
    • if more than 2000 variables remain, select the 2000 variables with the smallest p values

As a result, 396 variables were selected.
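
The selection criteria above amount to clamping the number of selected variables between 100 and 2000. A minimal sketch, assuming a plain numeric vector of p values; the function name select_variables and its argument names are hypothetical and not part of RoCA:

```r
# Return the indices of the selected variables, ordered by increasing p value
select_variables <- function(p, cutoff = 1e-5, min_n = 100, max_n = 2000) {
  ord <- order(p)                           # smallest p values first
  n_pass <- sum(p < cutoff, na.rm = TRUE)   # variables passing the cutoff
  # Clamp the count to [min_n, max_n], never exceeding the number available
  n_keep <- min(max(n_pass, min_n), max_n, length(p))
  ord[seq_len(n_keep)]
}

set.seed(1)
# Simulated p values: only 30 of 500 pass the cutoff,
# so the 100 variables with the smallest p values are kept
p <- c(runif(30, 0, 1e-6), runif(470, 0.01, 1))
sel <- select_variables(p)
length(sel)
```

Because the variables passing the cutoff always sort ahead of the others, taking the first n_keep indices covers all three cases with one expression.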

4 Heatmap

Figure 2. Color-coded data for the selected variables that differ across sample groups (red = higher). Variables (rows) were clustered by their correlation with each other, and samples were arranged by group.
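
A heatmap of this kind can be drawn with base R as sketched below. Several choices here are assumptions rather than details taken from the report: the simulated data, the correlation distance 1 - r, the default complete linkage, row scaling, and the blue-white-red palette.

```r
set.seed(1)
# Hypothetical data: 30 selected variables (rows) x 12 samples (columns), 3 groups
grp <- factor(rep(c("A", "B", "C"), each = 4))
x <- matrix(rnorm(30 * 12), nrow = 30,
            dimnames = list(paste0("var", 1:30), paste0("s", 1:12)))
x[1:10, grp == "A"]  <- x[1:10, grp == "A"] + 3
x[11:20, grp == "C"] <- x[11:20, grp == "C"] + 3

# Cluster rows on a correlation-based distance (1 - Pearson r)
row_hc <- hclust(as.dist(1 - cor(t(x))))

# Arrange columns by group instead of clustering them
col_order <- order(grp)

# Write the heatmap to a temporary PDF file; red marks higher values
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
heatmap(x[, col_order], Rowv = as.dendrogram(row_hc), Colv = NA,
        scale = "row", col = colorRampPalette(c("blue", "white", "red"))(64))
dev.off()
```

Setting Colv = NA keeps the samples in the supplied order, so each group forms a contiguous block of columns.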


5 Appendix

Check out the RoCA home page for more information.

5.1 Reproduce this report

To reproduce this report:

  1. Find the data analysis template you want to use and an example of its paired YAML file here, then download the YAML example to your working directory

  2. To generate a new report using your own input data and parameters, edit the following items in the YAML file:

    • output : where you want to put the output files
    • home : the URL if you have a home page for your project
    • analyst : your name
    • description : background information about your project, analysis, etc.
    • input : where your input data are located; read the instructions for preparing them
    • parameter : parameters for this analysis; read the instructions about how to prepare input data
  3. Run the code below within R Console or RStudio, preferably in a new R session:

if (!require(devtools)) { install.packages('devtools'); require(devtools); }
if (!require(RCurl)) { install.packages('RCurl'); require(RCurl); }
if (!require(RoCA)) { install_github('zhezhangsh/RoCAR'); require(RoCA); }

filename.yaml <- 'filename.yaml';  # path to the YAML file you just downloaded and edited for your analysis
CreateReport(filename.yaml);

If the code runs without errors, go to the output folder and open the index.html file to view the report.
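
The edits in step 2 can also be scripted rather than made by hand. The sketch below uses the yaml package and a small made-up YAML file as a stand-in for the downloaded example; the real example file has more content, and the values shown are placeholders.

```r
library(yaml)

# Hypothetical stand-in for the downloaded YAML example
yaml_file <- file.path(tempdir(), "example.yaml")
writeLines(c("output: path/to/output",
             "home: ''",
             "analyst: Someone",
             "description: Example project"), yaml_file)

# Load the file, change selected items, and write an edited copy
cfg <- yaml.load_file(yaml_file)
cfg$output  <- file.path(tempdir(), "my_report")  # where the output files go
cfg$analyst <- "Your Name"

edited_file <- file.path(tempdir(), "edited.yaml")
writeLines(as.yaml(cfg), edited_file)
```

Writing to a separate file leaves the downloaded example untouched for later reuse.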

5.2 Session information

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gplots_3.0.1        htmlwidgets_1.3     DT_0.5             
##  [4] RoCA_0.0.0.9000     awsomics_0.0.0.9000 RCurl_1.95-4.11    
##  [7] bitops_1.0-6        usethis_1.4.0       devtools_2.0.1     
## [10] yaml_2.2.0          rmarkdown_1.10      knitr_1.20         
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0         later_0.7.5        highr_0.7         
##  [4] compiler_3.5.1     prettyunits_1.0.2  base64enc_0.1-3   
##  [7] remotes_2.0.2      tools_3.5.1        digest_0.6.18     
## [10] pkgbuild_1.0.2     pkgload_1.0.2      jsonlite_1.5      
## [13] evaluate_0.12      memoise_1.1.0      rlang_0.3.0.1     
## [16] shiny_1.2.0        cli_1.0.1          rstudioapi_0.8    
## [19] crosstalk_1.0.0    withr_2.1.2        stringr_1.3.1     
## [22] caTools_1.17.1.1   gtools_3.8.1       desc_1.2.0        
## [25] fs_1.2.6           rprojroot_1.3-2    glue_1.3.0        
## [28] R6_2.3.0           processx_3.2.0     sessioninfo_1.1.1 
## [31] gdata_2.18.0       callr_3.0.0        magrittr_1.5      
## [34] promises_1.0.1     backports_1.1.2    ps_1.2.1          
## [37] htmltools_0.3.6    assertthat_0.2.0   xtable_1.8-3      
## [40] mime_0.6           httpuv_1.4.5       KernSmooth_2.23-15
## [43] stringi_1.2.4      crayon_1.3.4
